Implement double config for AdaptiveCrawler #1683

Vaccarini-Lorenzo wants to merge 2 commits into unclecode:develop from
Conversation
@Vaccarini-Lorenzo Runtime bugs:

Behavioral change:

Minor:

Could you take a look at these? The main fixes needed are adding the missing
…ion (#1682)

The embedding strategy uses two incompatible API call types: embedding calls (text-to-vector) and query expansion (chat completion). Previously both used a single embedding_llm_config, so setting an embedding model broke query expansion and vice versa.

Add query_llm_config to AdaptiveConfig and EmbeddingStrategy so users can specify separate models for each call type. The fallback chain preserves backward compatibility: query_llm_config -> llm_config -> hardcoded defaults.

Also fixes base_url and backoff params not being passed to perform_completion_with_backoff in query expansion, and simplifies _embedding_llm_config_dict to use LLMConfig.to_dict() (which includes the 3 backoff fields the manual extraction was missing).

Inspired by PR #1683 from @sthakrar — thank you for identifying the issue and proposing the initial approach.
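The fallback chain described in the commit can be sketched as follows. This is a hypothetical model, not the actual `adaptive_crawler.py` code: the `LLMConfig` fields and the `resolve_query_config` helper are stand-ins, and only the resolution order (`query_llm_config -> llm_config -> hardcoded defaults`) comes from the commit message.

```python
# Hypothetical sketch of the fallback chain: query_llm_config -> llm_config
# -> hardcoded defaults. Field names are stand-ins for illustration only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMConfig:
    provider: str
    base_url: Optional[str] = None
    api_token: Optional[str] = None

# Assumed hardcoded default for chat completion, matching the gpt-4o-mini
# example used elsewhere in this thread.
DEFAULT_CHAT = LLMConfig(provider="openai/gpt-4o-mini")

def resolve_query_config(query_llm_config: Optional[LLMConfig],
                         llm_config: Optional[LLMConfig]) -> LLMConfig:
    """Pick the config used for chat-completion (query expansion) calls."""
    return query_llm_config or llm_config or DEFAULT_CHAT
```

Because the embedding path resolves its own config independently, setting an embedding model no longer changes which model handles query expansion.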
Hey @Vaccarini-Lorenzo - thank you for filing #1682 and this PR! You identified a real design gap. We went ahead and landed this in develop.
The target API is exactly what you proposed:

```python
AdaptiveConfig(
    embedding_llm_config=LLMConfig(provider='openai/text-embedding-3-small'),
    query_llm_config=LLMConfig(provider='openai/gpt-4o-mini'),
)
```

Since the fix is already on develop, we're closing this PR.
Hi @unclecode

P.s.
@Vaccarini-Lorenzo Good catch on the typo - that's embarrassing! I'll fix the commit message to properly credit you instead of @sthakrar. Apologies for the mixup. And thanks again for the contribution - the separate query/embedding configs are a solid improvement.

By the way - we're building out Crawl4AI Cloud and starting paid collaborations with contributors who know the system well. If that's something you'd be interested in, send an email to aravind@crawl4ai.com (cc: unclecode@crawl4ai.com) and we can chat.
…ion (#1682)

The embedding strategy uses two incompatible API call types: embedding calls (text-to-vector) and query expansion (chat completion). Previously both used a single embedding_llm_config, so setting an embedding model broke query expansion and vice versa.

Add query_llm_config to AdaptiveConfig and EmbeddingStrategy so users can specify separate models for each call type. The fallback chain preserves backward compatibility: query_llm_config -> llm_config -> hardcoded defaults.

Also fixes base_url and backoff params not being passed to perform_completion_with_backoff in query expansion, and simplifies _embedding_llm_config_dict to use LLMConfig.to_dict() (which includes the 3 backoff fields the manual extraction was missing).

Inspired by PR #1683 from @Vaccarini-Lorenzo — thank you for identifying the issue and proposing the initial approach.
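The base_url/backoff part of the fix can be illustrated with a small sketch. The real `perform_completion_with_backoff` signature is not shown in this thread, and the three backoff field names below are assumptions; the sketch only models the idea of forwarding the full config dict instead of hand-picking fields.

```python
# Model of the bug: the old query-expansion path rebuilt the LLM kwargs by
# hand and silently dropped base_url and the backoff fields; the fix forwards
# the full config dict (LLMConfig.to_dict() style) instead.
def old_call_kwargs(cfg: dict) -> dict:
    # Manual extraction: only provider and api_token survive.
    return {"provider": cfg["provider"], "api_token": cfg["api_token"]}

def new_call_kwargs(cfg: dict) -> dict:
    # Forward everything the config carries.
    return dict(cfg)

cfg = {
    "provider": "openai/gpt-4o-mini",
    "api_token": "sk-...",                     # placeholder
    "base_url": "https://example.invalid/v1",  # placeholder
    # Assumed names for the three backoff fields mentioned in the commit:
    "max_retries": 3,
    "base_delay": 1.0,
    "max_delay": 60.0,
}
```

With the old extraction, a custom `base_url` (e.g. for a self-hosted endpoint) never reached the completion call; forwarding the whole dict makes that impossible to miss again.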
@Vaccarini-Lorenzo Fixed! The commit message now correctly credits you. Thanks for flagging it.
Summary
Proposed solution to fix Issue #1682
List of files changed and why
File impacted: adaptive_crawler.py
AdaptiveConfig now supports two configs: one for embedding calls and one for the chat completion API.
The configs support provider, base_url and api_token.
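A self-contained sketch of the two-config usage described above. The dataclasses are stand-ins for crawl4ai's real `LLMConfig` and `AdaptiveConfig` (shown only to make the shape concrete), and all credential/URL values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-ins for crawl4ai's LLMConfig and AdaptiveConfig, for illustration.
@dataclass
class LLMConfig:
    provider: str
    base_url: Optional[str] = None
    api_token: Optional[str] = None

@dataclass
class AdaptiveConfig:
    embedding_llm_config: Optional[LLMConfig] = None
    query_llm_config: Optional[LLMConfig] = None

config = AdaptiveConfig(
    embedding_llm_config=LLMConfig(
        provider="openai/text-embedding-3-small",
        base_url="https://api.openai.com/v1",  # optional endpoint override
        api_token="sk-embed-...",              # placeholder
    ),
    query_llm_config=LLMConfig(
        provider="openai/gpt-4o-mini",
        api_token="sk-chat-...",               # placeholder
    ),
)
```

Each config carries its own provider, base_url, and api_token, so the embedding endpoint and the chat-completion endpoint can point at entirely different services.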
How Has This Been Tested?
This is not a breaking change; it just adds an additional config option.
In relation to Issue #1682, the newly proposed approach would be
Checklist: